-
Notifications
You must be signed in to change notification settings - Fork 323
refactor(multipath): move discovery into EndpointStateActor
#3645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3645/docs/iroh/ Last updated: 2025-11-18T15:06:00Z |
flub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess most tests needs to be moved the the StaticProvider?
I don't quite follow the WantConnect stuff. Is this trying to make sure you can return an error straight from calling Endpoint::connect when you don't have any discovery results? As otherwise you'd start connecting and then the connection would time out?
iroh/src/magicsock.rs
Outdated
| // Prune our own addreses from the endpoint address. | ||
| // TODO: Move this somewhere else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please!
The EndpointStateActor should collect these addresses as usual but when choosing addresses to send to when handling SendDatagram it should filter out any that are itself. That would mean the possible remote addresses are not lost when you move network. The EndpointStateActor already has access to the local DirectAddrs so this is way more logical.
Yes, exactly. It mimics the current behavior that Edit: I renamed it to |
6189b7f to
196d89a
Compare
matheus23
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused about the local_addrs filtering. Is there some situation in particular that you saw that meant you had to add filtering?
Ideally none of our discovery services will accidentally discover local addresses.
And perhaps ideally the endpoint doesn't break too much when these addresses do end up in our EndpointStateActor::paths, right?
I did not add it, only moved it around. It lives here on |
|
Given that check used to happen in |
This sounds good to me, but I'd like to hear @flub's opinion too. Edit: I remember now why I moved it to |
|
|
||
| # test_utils | ||
| axum = { version = "0.8", optional = true } | ||
| sync_wrapper = { version = "1.0.2", features = ["futures"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 another dependency just for making things sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another? which one is the other one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add it to n0-futures though. The whole crate is <300 lines.
https://docs.rs/sync_wrapper/latest/src/sync_wrapper/lib.rs.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another, to the many dependencies we already have 😓
My suggestion would be to update It's a little bit more bookkeeping, but avoids filtering in the |
| .or_default() | ||
| .sources | ||
| .insert(Source::Connection { _0: Private }, Instant::now()); | ||
| self.add_path_entry(path_remote, Source::Connection { _0: Private }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I'm a fan of having a helper function for this. It hides what state we're modifying. A self.paths.add_entry() would be much more amenable. I kind of think we should modify local state directly, otherwise we're building something too complex and need to tweak the abstractions.
As a similar example I had several iterations of pretty horrible state management issues before I ended up finding ConnectionState in its current form, which ended up being much more maintainable (and may need to evolve further at some point, this is by no ways some perfect abstraction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've made it all the way to the end of the endpoint_state.rs changes and finally have managed to come up with a suggestion. You might encounter a few more comments on this topic. Not 100% sure it's the right one as I haven't tried but anyway.
I think the complexity I have issue with comes from that you need to respond to the ResolveRemote requests at two points:
- whenever
self.pathschanges - whenever discovery found something
And then you end up with a brittle combination of mutating state and calling hooks at the right points all over the place.
Can this not be moved into self.paths? Maybe path_state.rs can redeem itself for still being it's own module 😉 (currently a historical accident). If we had an EndpointPathState (bikeshedding welcome) that you passed the oneshot senders into. And that way you can have a clear abstraction boundary for paths being added and it can emit change responses whenever we get a new remote transport address for whatever reason.
Would that go a long way to reducing the brittleness and complexity that's being added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would indeed be a better structure likely, yes.
I implemented this, see the latest commit on this branch.
| .or_default() | ||
| .sources | ||
| .insert(source.clone(), Instant::now()); | ||
| /// Adds new [`TransportAddr`] addresses to our list of potential paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add what this really does. Because appending to a list does not require a function. What magic happens?
| name: item.provenance().to_string(), | ||
| }; | ||
| let addr = item.into_endpoint_addr(); | ||
| self.add_addrs(addr.addrs, source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not calling self.emit_pending_resolve_requests here is a red flag for state management to me. Because the other branches have this. It makes things complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now goes over the EndpointPathState, lmk if you still think the interactions are not clear enough.
|
|
||
| /// Error returned when the endpoint state actor stopped while waiting for a reply. | ||
| #[stack_error(derive)] | ||
| #[stack_error(add_meta, derive)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh! TIL why things where the shape they were before.
bbb3d27 to
b9aceab
Compare
162c7c2 to
147d1a0
Compare
b02df8a to
15d4cf4
Compare
e8178b4 to
ed14a71
Compare
| /// Notifies that a discovery run has finished. | ||
| /// | ||
| /// This will emit pending resolve requests. | ||
| pub(super) fn discovery_finished(&mut self, error: Option<DiscoveryError>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, but doesn't res: Result<(), DiscoveryError> make more sense?
| scheduled_open_path: None, | ||
| pending_open_paths: VecDeque::new(), | ||
| sender, | ||
| discovery, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So definitely nitpicking. I alsoways liked: https://rust-lang.github.io/rust-clippy/stable/index.html#inconsistent_struct_constructor
But as the lint is not enabled I'm sure you get to choose what you do :)
Description
Fixes #3642
This moves discovery handling fully into the
EndpointStateActor.The pub(crate) interface to trigger discovery and get a EndpointMappedAddr is now
Magicsock::resolve_remote, which sends the provided addresses to the EndpointStateActor. The actor starts discovery if it does not have a selected path and if discovery is not running. It returns either immediately if there are any known paths, or waits for discovery to produce at least one result or an error. Once this returns,resolve_remotereturns either with a EndpointMappedAddr or with the discovery error.This means the current behavior is kept: We only start
quinn::Endpoint::connectonce we have at least one transport address for the remote. If not, we return the discovery error immediately fromiroh::Endpoint::connect.This opens the door for us to easily tune when to run discovery in other siutations, e.g. when all available paths to a remote are closed. However, for now this PR still only starts discovery when
Endpoint::connectis called and no path is selected at the moment.Breaking Changes
Notes & open questions
Change checklist
quic-rpciroh-gossipiroh-blobsdumbpipesendme